In [2]:
import pandas as pd
import seaborn as sns
import plotly.express as px

import matplotlib.pyplot as plt
In [3]:
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"

Matplotlib¶

For this exercise, the following code loads the stock dataset built into Plotly Express.

In [4]:
stocks = px.data.stocks()
stocks.head()
Out[4]:
date GOOG AAPL AMZN FB NFLX MSFT
0 2018-01-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 2018-01-08 1.018172 1.011943 1.061881 0.959968 1.053526 1.015988
2 2018-01-15 1.032008 1.019771 1.053240 0.970243 1.049860 1.020524
3 2018-01-22 1.066783 0.980057 1.140676 1.016858 1.307681 1.066561
4 2018-01-29 1.008773 0.917143 1.163374 1.018357 1.273537 1.040708

Question 1:¶

Select a stock and create a suitable plot for it. Make sure the plot is readable and shows the relevant information, such as dates and values.

In [5]:
# axes x and y
x = stocks["date"]
y = stocks["GOOG"]

# initiate plot with title and labels

fig, ax = plt.subplots(1,1,figsize=(10,8))
ax.plot(x,y)
ax.set_title("Google stock")
ax.set_xlabel("date")
ax.set_ylabel("stock value")

# set the ticks on the x axis
xticks = ax.get_xticks()
new_xticks = xticks[0::14]
ax.set_xticks(new_xticks)

# show plot
plt.show()

Question 2:¶

You have already plotted data for one stock. Several stocks can be plotted together to support comparison.
To distinguish the lines, customise line styles, markers, and colors, and add a legend to the plot.

In [6]:
# axes x and y
x = stocks["date"]
y1 = stocks["GOOG"]
y2 = stocks["AAPL"]
y3 = stocks["AMZN"]
y4 = stocks["FB"]
y5 = stocks["NFLX"]
y6 = stocks["MSFT"]

# initiate plots with title, labels, and legend
fig, ax = plt.subplots(1,1,figsize=(10,8))
ax.plot(x, y1, label = "GOOG")
ax.plot(x, y2, label = "AAPL")
ax.plot(x, y3, label = "AMZN")
ax.plot(x, y4, label = "FB")
ax.plot(x, y5, label = "NFLX")
ax.plot(x, y6, label = "MSFT")
ax.set_title("Stocks")
ax.set_xlabel("date")
ax.set_ylabel("stock value")
ax.legend()

# set the ticks on the x axis
xticks = ax.get_xticks()
new_xticks = xticks[0::14]
ax.set_xticks(new_xticks)

# show plot
plt.show()
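
Question 2 also asks for distinct line styles and markers, which `ax.plot` accepts through its `linestyle`, `marker`, and `color` arguments. The following is a minimal sketch under some assumptions: it uses only three stocks and the five rows shown in the `stocks.head()` output, and the helper name `plot_styled` and the particular style choices are illustrative rather than part of the exercise.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First five rows of three stocks, copied from the stocks.head() output
sample = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22", "2018-01-29"],
    "GOOG": [1.000000, 1.018172, 1.032008, 1.066783, 1.008773],
    "AAPL": [1.000000, 1.011943, 1.019771, 0.980057, 0.917143],
    "AMZN": [1.000000, 1.061881, 1.053240, 1.140676, 1.163374],
})

# Illustrative choices: (linestyle, marker, color) for each stock
styles = {
    "GOOG": ("-", "o", "tab:blue"),
    "AAPL": ("--", "s", "tab:orange"),
    "AMZN": (":", "^", "tab:green"),
}

def plot_styled(df, styles):
    """Draw one styled line per stock (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(10, 8))
    for name, (linestyle, marker, color) in styles.items():
        ax.plot(df["date"], df[name],
                linestyle=linestyle, marker=marker, color=color, label=name)
    ax.set_title("Stocks")
    ax.set_xlabel("date")
    ax.set_ylabel("stock value")
    ax.legend()
    return fig, ax

fig, ax = plot_styled(sample, styles)
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` afterwards. Passing the full `stocks` dataframe instead of `sample` should work the same way, provided `styles` has an entry for each column to be drawn.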

Seaborn¶

First, load the tips dataset.

In [7]:
tips = sns.load_dataset('tips')
tips.head()
Out[7]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

Question 3:¶

Let's explore this dataset. Pose a question and create a plot that helps answer it.

Some possible questions:

  • Are there differences between male and female customers when it comes to giving tips?
  • Which attribute correlates the most with tip?
In [8]:
# Is there a correlation between day of the week and total bills paid?


# (day code in the data, subplot title, bar color) for each histogram
days = [('Thur', 'Thursday', 'g'), ('Fri', 'Friday', 'r'),
        ('Sat', 'Saturday', 'm'), ('Sun', 'Sunday', 'y')]

fig = plt.figure(figsize=(10,10))

gs = fig.add_gridspec(nrows=len(days), ncols=1)

# One histogram of total_bill per day, stacked vertically
for row, (day, title, color) in enumerate(days):
    ax = fig.add_subplot(gs[row,:])
    ax.set_title(title)
    x = tips.loc[tips['day'] == day, 'total_bill']
    ax.set_ylabel('Frequency')
    ax.set_xlabel('Total Bill')
    ax.hist(x, bins=50, rwidth=0.8, color=color)

fig.tight_layout()

# Joint plot
sns.jointplot(x='total_bill', y='day', data=tips)


plt.show()
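
The first suggested question, whether tips differ between male and female customers, can be approached with seaborn directly. The sketch below is only illustrative: it assumes the five rows shown in the `tips.head()` output, so it says nothing about the full dataset, and the variable names `sample` and `mean_tip` are made up for this example.

```python
import pandas as pd
import seaborn as sns

# First five rows of the tips dataset, copied from the tips.head() output
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Numeric summary: mean tip per group
mean_tip = sample.groupby("sex")["tip"].mean()

# Visual summary: one box per group; seaborn returns the matplotlib Axes
ax = sns.boxplot(data=sample, x="sex", y="tip")
ax.set_title("Tip by sex (five-row sample)")
```

To run this on the real data, pass `tips` instead of `sample`. A box plot shows the median and spread of each group side by side, which is usually easier to compare than separate histograms.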

Plotly Express¶

Question 4:¶

Redo the above exercises (questions 2 and 3) with Plotly Express. Create diagrams that you can interact with.

The stocks dataset¶

Hints:

  • Turn the stocks dataframe into a structure that Plotly Express can pick up easily
In [9]:
fig = px.line(stocks, x="date", y=stocks.columns[1:7])
fig.show()

The tips dataset¶

In [10]:
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color = "day")
fig.show()

Question 5:¶

Recreate the barplot below that shows the population of different continents for the year 2007.

Hints:

  • Extract the 2007 data from the dataframe and process it accordingly (for example, sum the population per continent)
  • Use the Plotly Express bar chart
  • Use a different color for each continent
  • Sort the continents for the visualisation using the axis layout settings
  • Add text to each bar that shows the population
In [11]:
#load data
df = px.data.gapminder()
df.head()
Out[11]:
country continent year lifeExp pop gdpPercap iso_alpha iso_num
0 Afghanistan Asia 1952 28.801 8425333 779.445314 AFG 4
1 Afghanistan Asia 1957 30.332 9240934 820.853030 AFG 4
2 Afghanistan Asia 1962 31.997 10267083 853.100710 AFG 4
3 Afghanistan Asia 1967 34.020 11537966 836.197138 AFG 4
4 Afghanistan Asia 1972 36.088 13079460 739.981106 AFG 4
In [12]:
# Keep only the 2007 rows
df_2007 = df.query('year==2007')

# Sum up population by continent, keeping continent as a regular column
df_sum = df_2007.groupby('continent', as_index=False)['pop'].sum()

# Sort data into ascending order
df_sorted = df_sum.sort_values(by='pop',ascending=True)

# Create bar chart; every column comes from df_sorted so each label matches its bar
fig = px.bar(df_sorted, x="pop", y="continent",
             orientation='h', color="continent",
             labels={'pop':'population'}, text='pop')
fig.show()